Squeakr: an exact and approximate k-mer counting system.
Internal identifier: 000804 (Main/Exploration); previous: 000803; next: 000805
Authors: Prashant Pandey [United States]; Michael A. Bender [United States]; Rob Johnson [United States]; Rob Patro [United States]; Bonnie Berger
Source:
- Bioinformatics (Oxford, England) [1367-4811]; 2018.
French descriptors
- KwdFr :
- MESH :
English descriptors
- KwdEn :
- MESH :
- genetics: Eukaryota.
- methods: Gene Expression Profiling; Genomics; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Sequence Analysis, RNA.
- Algorithms; Animals; Genome; Humans; Software.
Abstract
k-mer-based algorithms have become increasingly popular in the processing of high-throughput sequencing data. These algorithms span the gamut of the analysis pipeline from k-mer counting (e.g. for estimating assembly parameters), to error correction, genome and transcriptome assembly, and even transcript quantification. Yet, these tasks often use very different k-mer representations and data structures. In this article, we show how to build a k-mer-counting and multiset-representation system using the counting quotient filter, a feature-rich approximate membership query data structure. We introduce the k-mer-counting/querying system Squeakr (Simple Quotient filter-based Exact and Approximate Kmer Representation), which is based on the counting quotient filter. This off-the-shelf data structure turns out to be an efficient (approximate or exact) representation for sets or multisets of k-mers.
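As a rough illustration of the k-mer counting task the abstract refers to, the following Python sketch builds an exact multiset of k-mers with a plain hash table. It is not Squeakr's implementation and does not use a counting quotient filter. The function names `canonical` and `count_kmers`, and the choice to merge each k-mer with its reverse complement, are assumptions made for this example only.

```python
from collections import Counter

# Translation table for complementing DNA bases (assumes the ACGT alphabet).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k: int) -> Counter:
    """Count canonical k-mers exactly; windows with non-ACGT characters are skipped."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


# Two toy reads (hypothetical data), counted with k = 3.
counts = count_kmers(["ACGTAC", "GTACGT"], k=3)
```

A hash table like this stores every k-mer in full, which is the memory cost that compact structures such as the counting quotient filter named in the abstract aim to reduce.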
DOI: 10.1093/bioinformatics/btx636
PubMed: 29444235
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream PubMed, to step Corpus: 000996
- to stream PubMed, to step Curation: 000996
- to stream PubMed, to step Checkpoint: 000763
- to stream Ncbi, to step Merge: 001D35
- to stream Ncbi, to step Curation: 001D35
- to stream Ncbi, to step Checkpoint: 001D35
- to stream Main, to step Merge: 000807
- to stream Main, to step Curation: 000804
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Squeakr: an exact and approximate k-mer counting system.</title>
<author><name sortKey="Pandey, Prashant" sort="Pandey, Prashant" uniqKey="Pandey P" first="Prashant" last="Pandey">Prashant Pandey</name>
<affiliation wicri:level="2"><nlm:affiliation>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Bender, Michael A" sort="Bender, Michael A" uniqKey="Bender M" first="Michael A" last="Bender">Michael A. Bender</name>
<affiliation wicri:level="2"><nlm:affiliation>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Johnson, Rob" sort="Johnson, Rob" uniqKey="Johnson R" first="Rob" last="Johnson">Rob Johnson</name>
<affiliation wicri:level="2"><nlm:affiliation>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Patro, Rob" sort="Patro, Rob" uniqKey="Patro R" first="Rob" last="Patro">Rob Patro</name>
<affiliation wicri:level="2"><nlm:affiliation>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Berger, Bonnie" sort="Berger, Bonnie" uniqKey="Berger B" first="Bonnie" last="Berger">Bonnie Berger</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2018">2018</date>
<idno type="RBID">pubmed:29444235</idno>
<idno type="pmid">29444235</idno>
<idno type="doi">10.1093/bioinformatics/btx636</idno>
<idno type="wicri:Area/PubMed/Corpus">000996</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000996</idno>
<idno type="wicri:Area/PubMed/Curation">000996</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000996</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000763</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">000763</idno>
<idno type="wicri:Area/Ncbi/Merge">001D35</idno>
<idno type="wicri:Area/Ncbi/Curation">001D35</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001D35</idno>
<idno type="wicri:Area/Main/Merge">000807</idno>
<idno type="wicri:Area/Main/Curation">000804</idno>
<idno type="wicri:Area/Main/Exploration">000804</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Squeakr: an exact and approximate k-mer counting system.</title>
<author><name sortKey="Pandey, Prashant" sort="Pandey, Prashant" uniqKey="Pandey P" first="Prashant" last="Pandey">Prashant Pandey</name>
<affiliation wicri:level="2"><nlm:affiliation>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Bender, Michael A" sort="Bender, Michael A" uniqKey="Bender M" first="Michael A" last="Bender">Michael A. Bender</name>
<affiliation wicri:level="2"><nlm:affiliation>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Johnson, Rob" sort="Johnson, Rob" uniqKey="Johnson R" first="Rob" last="Johnson">Rob Johnson</name>
<affiliation wicri:level="2"><nlm:affiliation>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Patro, Rob" sort="Patro, Rob" uniqKey="Patro R" first="Rob" last="Patro">Rob Patro</name>
<affiliation wicri:level="2"><nlm:affiliation>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Berger, Bonnie" sort="Berger, Bonnie" uniqKey="Berger B" first="Bonnie" last="Berger">Bonnie Berger</name>
</author>
</analytic>
<series><title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint><date when="2018" type="published">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Animals</term>
<term>Eukaryota (genetics)</term>
<term>Gene Expression Profiling (methods)</term>
<term>Genome</term>
<term>Genomics (methods)</term>
<term>High-Throughput Nucleotide Sequencing (methods)</term>
<term>Humans</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Sequence Analysis, RNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de profil d'expression de gènes (méthodes)</term>
<term>Analyse de séquence d'ADN (méthodes)</term>
<term>Analyse de séquence d'ARN (méthodes)</term>
<term>Animaux</term>
<term>Eucaryotes (génétique)</term>
<term>Génome</term>
<term>Génomique (méthodes)</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Séquençage nucléotidique à haut débit (méthodes)</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en"><term>Eukaryota</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr"><term>Eucaryotes</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Gene Expression Profiling</term>
<term>Genomics</term>
<term>High-Throughput Nucleotide Sequencing</term>
<term>Sequence Analysis, DNA</term>
<term>Sequence Analysis, RNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Animals</term>
<term>Genome</term>
<term>Humans</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de profil d'expression de gènes</term>
<term>Analyse de séquence d'ADN</term>
<term>Analyse de séquence d'ARN</term>
<term>Animaux</term>
<term>Génome</term>
<term>Génomique</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Séquençage nucléotidique à haut débit</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">k-mer-based algorithms have become increasingly popular in the processing of high-throughput sequencing data. These algorithms span the gamut of the analysis pipeline from k-mer counting (e.g. for estimating assembly parameters), to error correction, genome and transcriptome assembly, and even transcript quantification. Yet, these tasks often use very different k-mer representations and data structures. In this article, we show how to build a k-mer-counting and multiset-representation system using the counting quotient filter, a feature-rich approximate membership query data structure. We introduce the k-mer-counting/querying system Squeakr (Simple Quotient filter-based Exact and Approximate Kmer Representation), which is based on the counting quotient filter. This off-the-shelf data structure turns out to be an efficient (approximate or exact) representation for sets or multisets of k-mers.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>État de New York</li>
</region>
</list>
<tree><noCountry><name sortKey="Berger, Bonnie" sort="Berger, Bonnie" uniqKey="Berger B" first="Bonnie" last="Berger">Bonnie Berger</name>
</noCountry>
<country name="États-Unis"><region name="État de New York"><name sortKey="Pandey, Prashant" sort="Pandey, Prashant" uniqKey="Pandey P" first="Prashant" last="Pandey">Prashant Pandey</name>
</region>
<name sortKey="Bender, Michael A" sort="Bender, Michael A" uniqKey="Bender M" first="Michael A" last="Bender">Michael A. Bender</name>
<name sortKey="Johnson, Rob" sort="Johnson, Rob" uniqKey="Johnson R" first="Rob" last="Johnson">Rob Johnson</name>
<name sortKey="Patro, Rob" sort="Patro, Rob" uniqKey="Patro R" first="Rob" last="Patro">Rob Patro</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000804 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000804 | SxmlIndent | more
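Outside a Dilib installation, the XML record shown above can also be read with a general-purpose parser. The sketch below is one possible approach in Python, not part of Dilib. The prefixes `wicri:` and `nlm:` are not declared in the record, so a strict parser would reject it. The helper therefore binds them to placeholder URIs (hypothetical, chosen only for this example) before parsing. `SAMPLE` is a shortened copy of the record, and `parse_record` and `summarize` are names introduced here.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record above (one author only), used as test input.
SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Squeakr: an exact and approximate k-mer counting system.</title>
<author><name sortKey="Pandey, Prashant" sort="Pandey, Prashant" uniqKey="Pandey P" first="Prashant" last="Pandey">Prashant Pandey</name>
<affiliation wicri:level="2"><nlm:affiliation>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA.</nlm:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="pmid">29444235</idno>
<idno type="doi">10.1093/bioinformatics/btx636</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder namespace URIs: assumptions for this example, not official values.
_ROOT_WITH_NAMESPACES = (
    '<record xmlns:wicri="urn:example:wicri" xmlns:nlm="urn:example:nlm">'
)


def parse_record(text: str) -> ET.Element:
    """Bind the undeclared prefixes on the root element, then parse."""
    patched = text.replace("<record>", _ROOT_WITH_NAMESPACES, 1)
    return ET.fromstring(patched)


def summarize(text: str) -> dict:
    """Extract the title, author names, and DOI from a record string."""
    root = parse_record(text)
    return {
        "title": root.find(".//titleStmt/title").text,
        "authors": [n.text for n in root.findall(".//titleStmt/author/name")],
        "doi": root.find(".//idno[@type='doi']").text,
    }


summary = summarize(SAMPLE)
```

Only the root element is patched, so the rest of the record is parsed exactly as stored.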
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= Main |étape= Exploration |type= RBID |clé= pubmed:29444235 |texte= Squeakr: an exact and approximate k-mer counting system. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i -Sk "pubmed:29444235" \
| HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd \
| NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.